AITopics | imputation method

Unsupervised Anomaly Detection in The Presence of Missing Values

Neural Information Processing SystemsApr-30-2026, 18:10:21 GMT

Anomaly detection methods typically require fully observed data for model training and inference and cannot handle incomplete data, while the missing data problem is pervasive in science and engineering, leading to challenges in many important applications such as abnormal user detection in recommendation systems and novel or anomalous cell detection in bioinformatics, where the missing rates can be higher than 30\% or even 80\%. In this work, first, we construct and evaluate a straightforward strategy, ''impute-then-detect'', via combining state-of-the-art imputation methods with unsupervised anomaly detection methods, where the training data are composed of normal samples only. We observe that such two-stage methods frequently yield imputation bias from normal data, namely, the imputation methods are inclined to make incomplete samples ''normal, where the fundamental reason is that the imputation models learned only on normal data and cannot generalize well to abnormal data in the inference stage. To address this challenge, we propose an end-to-end method that integrates data imputation with anomaly detection into a unified optimization problem. The proposed model learns to generate well-designed pseudo-abnormal samples to mitigate the imputation bias and ensure the discrimination ability of both the imputation and detection processes. Furthermore, we provide theoretical guarantees for the effectiveness of the proposed method, proving that the proposed method can correctly detect anomalies with high probability. Experimental results on datasets with manually constructed missing values and inherent missing values demonstrate that our proposed method effectively mitigates the imputation bias and surpasses the baseline methods significantly.

artificial intelligence, data mining, proceedings, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Add feedback

17f158c25b08758cf650130f7f173e51-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 08:30:56 GMT

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.71)

Add feedback

206361867abf7eb01746c3943078da3c-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 01:15:30 GMT

artificial intelligence, international conference, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(4 more...)

Add feedback

206361867abf7eb01746c3943078da3c-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 01:15:26 GMT

artificial intelligence, international conference, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Add feedback

A Large-Scale Comparative Analysis of Imputation Methods for Single-Cell RNA Sequencing Data

Iwashita, Yuichiro, Abbasi, Ahtisham Fazeel, Kise, Koichi, Dengel, Andreas, Asim, Muhammad Nabeel

arXiv.org Machine LearningApr-15-2026

Background: Single-cell RNA sequencing (scRNA-seq) enables gene expression profiling at cellular resolution but is inherently affected by sparsity caused by dropout events, where expressed genes are recorded as zeros due to technical limitations. These artifacts distort gene expression distributions and compromise downstream analyses. Numerous imputation methods have been proposed to recover latent transcriptional signals. These methods range from traditional statistical models to deep learning (DL)-based methods. However, their comparative performance remains unclear, as existing benchmarks evaluate only a limited subset of methods, datasets, and downstream analyses. Results: We present a comprehensive benchmark of 15 scRNA-seq imputation methods spanning 7 methodological categories, including traditional and DL-based methods. Methods are evaluated across 30 datasets from 10 experimental protocols on 6 downstream analyses. Results show that traditional methods, such as model-based, smoothing-based, and low-rank matrix-based methods, generally outperform DL-based methods, including diffusion-based, GAN-based, GNN-based, and autoencoder-based methods. In addition, strong performance in numerical gene expression recovery does not necessarily translate into improved biological interpretability in downstream analyses, including cell clustering, differential expression analysis, marker gene analysis, trajectory analysis, and cell type annotation. Furthermore, method performance varies substantially across datasets, protocols, and downstream analyses, with no single method consistently outperforming others. Conclusions: Our findings provide practical guidance for selecting imputation methods tailored to specific analytical objectives and underscore the importance of task-specific evaluation when assessing imputation performance in scRNA-seq data analysis.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2603.24626

Country:

Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
(4 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.67)
Health & Medicine > Therapeutic Area > Immunology (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data

Ibenegbu, Amuche, de Micheaux, Pierre Lafaye, Chandra, Rohitash

arXiv.org Machine LearningApr-10-2026

Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) has been prominent for imputing missing values through "fully conditional specification". We extend MICE using the Bayesian framework (tBayes-MICE), utilising Bayesian inference to impute missing values via Markov Chain Monte Carlo (MCMC) sampling to account for uncertainty in MICE model parameters and imputed values. We also include temporally informed initialisation and time-lagged features in the model to respect the sequential nature of time-series data. We evaluate the tBayes-MICE method using two real-world datasets (AirQuality and PhysioNet), and using both the Random Walk Metropolis (RWM) and the Metropolis-Adjusted Langevin Algorithm (MALA) samplers. Our results demonstrate that tBayes-MICE reduces imputation errors relative to the baseline methods over all variables and accounts for uncertainty in the imputation process, thereby providing a more accurate measure of imputation error. We also found that MALA mixed better than RWM across most variables, achieving comparable accuracy while providing more consistent posterior exploration. Overall, these findings suggest that the tBayes-MICE framework represents a practical and efficient approach to time-series imputation, balancing increased accuracy with meaningful quantification of uncertainty in various environmental and clinical settings.

artificial intelligence, imputation, machine learning, (19 more...)

arXiv.org Machine Learning

2603.27142

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Oceania > Australia > New South Wales (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Generative Modeling under Non-Monotonic MAR Missingness via Approximate Wasserstein Gradient Flows

Kremling, Gitte, Näf, Jeffrey, Lederer, Johannes

arXiv.org Machine LearningApr-7-2026

The prevalence of missing values in data science poses a substantial risk to any further analyses. Despite a wealth of research, principled nonparametric methods to deal with general non-monotone missingness are still scarce. Instead, ad-hoc imputation methods are often used, for which it remains unclear whether the correct distribution can be recovered. In this paper, we propose FLOWGEM, a principled iterative method for generating a complete dataset from a dataset with values Missing at Random (MAR). Motivated by convergence results of the ignoring maximum likelihood estimator, our approach minimizes the expected Kullback-Leibler (KL) divergence between the observed data distribution and the distribution of the generated sample over different missingness patterns. To minimize the KL divergence, we employ a discretized particle evolution of the corresponding Wasserstein Gradient Flow, where the velocity field is approximated using a local linear estimator of the density ratio. This construction yields a data generation scheme that iteratively transports an initial particle ensemble toward the target distribution. Simulation studies and real-data benchmarks demonstrate that FLOWGEM achieves state-of-the-art performance across a range of settings, including the challenging case of non-monotonic MAR mechanisms. Together, these results position FLOWGEM as a principled and practical alternative to existing imputation methods, and a decisive step towards closing the gap between theoretical rigor and empirical performance.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2604.04567

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

Task-oriented Time Series Imputation Evaluation via Generalized Representers Zhixian Wang 1,2, Linxiao Y ang

Neural Information Processing SystemsFeb-18-2026, 18:25:08 GMT

Specifically, we summarize our contributions into the following points.

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > California (0.04)
Europe > France (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Energy > Power Industry (0.93)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)
(3 more...)

Add feedback

Multivariate Time Series Imputation with Generative Adversarial Networks

Yonghong Luo, Xiangrui Cai, Ying ZHANG, Jun Xu, Yuan xiaojie

Neural Information Processing SystemsFeb-13-2026, 19:53:28 GMT

Neural Information Processing Systems http://nips.cc/

dataset, imputation method, time sery, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Tianjin Province > Tianjin (0.05)
Asia > China > Beijing > Beijing (0.04)
North America > United States > Massachusetts > Middlesex County > Natick (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Signals PubliclyAvailable Data Realistic Missingness DirectlyEvaluates Imputation

Neural Information Processing SystemsFeb-11-2026, 07:20:38 GMT

The promise of mobile health (mHealth) is the ability to use wearable sensors to monitor participant physiology athigh frequencies during daily life to enable temporally-precise health interventions.

artificial intelligence, data quality, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.46)

Industry: